Stanford Matrix Considered Harmful

Author

  • Sebastiano Vigna
Abstract

The idea for this note arose during the “Web Information Retrieval and Linear Algebra Algorithms” workshop held at Schloss Dagstuhl in February 2007. Many brilliant people working on either side (numerical analysis and web search) had a chance to meet and talk for one week about mathematical and practical aspects of linear methods for ranking, and in particular (not surprisingly) PageRank and HITS. Many scientific aspects were surprising for both sides and were (finally, one might say) stated clearly at the workshop.

First of all, PageRank is not the most important factor in Google’s scoring, or in any search engine's scoring. It is one of literally hundreds of features that are somehow combined (e.g., by a standard machine-learning framework), and its importance has decreased over time (people in the so-called “search engine optimisation” industry claim that the importance of PageRank dropped drastically around 2003, but this claim is based solely on reverse engineering). Yet papers are published every day starting with claims such as “PageRank is the most important ranking...” and so on. Such statements give a distorted view of reality, but clearly work very well for publishing papers. On the other hand, it is also clear that PageRank (and, more generally, link-based ranking) is useful in other areas, such as deciding which pages to crawl and which not.

Moreover, when one attacks the real PageRank computation problem, many sophisticated methods devised by the numerical analysis community suddenly turn out to be pretty useless because of the very large size of the matrix. This problem actually had the very positive effect of stimulating a lot of new research in numerical analysis methods that could be applied to large, sparse, irregular stochastic matrices of dimension 10 and beyond. Much of the classical work on such large systems involved matrices with considerable structure, arising from ODEs, PDEs, queuing theory, etc. Actually, many of the more promising approaches for PageRank computation use some kind of graph structural analysis (usually related to strongly connected components) that is rather meaningless in the classical problems mentioned above.

These considerations bring us to the point of this note. The problem of computing PageRank is interesting from a practical viewpoint only if the matrix is large and if it comes from a web graph. What do we mean by “large”? Currently, search engines claim to index a number of pages in the order of 10. We cannot expect, as scientists, to replicate exactly

Journal:
  • CoRR

Volume: abs/0710.1962

Publication date: 2007